.Dd December 5, 2023
.Dt LLAMAFILE-IMATRIX 1
.Os Llamafile Manual
.Sh NAME
.Nm llamafile-imatrix
.Nd importance matrix builder
.Sh SYNOPSIS
.Nm
.Op flags...
.Fl m Ar model.gguf
.Fl f Ar training.data
.Op Fl o Ar imatrix.dat
.Sh DESCRIPTION
.Nm
computes an importance matrix for a model from a given text dataset. The
resulting matrix can be used during quantization to enhance the quality
of the quantized models.
More information is available here:
https://github.com/ggerganov/llama.cpp/pull/4861
.Sh OPTIONS
The following options are available:
.Bl -tag -width indent
.It Fl Fl version
Print version and exit.
.It Fl h , Fl Fl help
Show help message and exit.
.It Fl m Ar FNAME , Fl Fl model Ar FNAME
Model path in the GGUF file format.
.Pp
Default:
.Pa models/7B/ggml-model-f16.gguf
.It Fl f Ar FNAME , Fl Fl file Ar FNAME
Path of the file containing the training data (required), e.g.
.Pa wiki.train.raw
.It Fl o Ar FNAME , Fl Fl output-file Ar FNAME
The name of the file where the computed data will be stored. If this
flag is missing then
.Pa imatrix.dat
is used.
.It Fl ofreq Ar N , Fl Fl output-frequency Ar N
Specifies how often the result computed so far is saved to disk. The
default is 10 (i.e., every 10 chunks).
.It Fl ow , Fl Fl output-weight
Specifies whether data will be collected for the
.Pa output.weight
tensor. Experience indicates that it is better not to use the
importance matrix when quantizing
.Pa output.weight ,
so this is disabled by default.
.It Fl Fl chunks Ar N
Maximum number of chunks to process.
.Pp
.Bl -dash -compact
.It
-1 = all
.El
.Pp
Default: -1
.El
.Sh PROTIPS
For faster computation, pass the
.Fl ngl Ar 9999
flag for GPU offloading.
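.Sh EXAMPLES
The following is a minimal sketch of a typical run, using only the flags
documented above. The file names
.Pa model.gguf
and
.Pa wiki.train.raw
are placeholders and should be replaced with real paths.
.Bd -literal -offset indent
llamafile-imatrix -m model.gguf -f wiki.train.raw \e
  -o imatrix.dat --chunks 100 -ngl 9999
.Ed
.Pp
Here
.Fl Fl chunks Ar 100
limits the run to at most 100 chunks, which may be useful for a quick
trial before processing the whole dataset, and
.Fl ngl Ar 9999
requests GPU offloading.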
.Sh SEE ALSO
.Xr llamafile 1 ,
.Xr llamafile-quantize 1
